In recent years, the performance of sentiment analysis methods has improved substantially, driven by various models based on the Transformer architecture, in particular BERT. However, deep neural network models are difficult to train and poorly interpretable. An alternative is rule-based methods that use sentiment lexicons: they are fast, require no training, and are well interpretable. Recently, however, with the widespread adoption of deep learning, lexicon-based methods have faded into the background. The purpose of this paper is to study the SO-CAL and SentiStrength lexicon-based approaches, adapted for the Russian language. We tested these methods, as well as the RuBERT neural network model, on 16 text corpora and analyzed their results. On average, RuBERT outperformed the lexicon-based methods, but SO-CAL surpassed RuBERT on several of the 16 corpora.
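To make the contrast with neural models concrete, here is a minimal sketch of how a SO-CAL-style lexicon method scores text. The lexicon, intensifier weights, and negation handling below are toy assumptions for illustration, not the actual SO-CAL dictionaries or rules:

```python
# Minimal sketch of a SO-CAL-style lexicon scorer (toy lexicon, NOT the
# real SO-CAL dictionaries): each word carries a polarity weight,
# intensifiers scale the next sentiment word, and negators flip its sign.

LEXICON = {"good": 3, "great": 4, "bad": -3, "terrible": -4}  # hypothetical weights
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}
NEGATORS = {"not", "never"}

def score(text: str) -> float:
    tokens = text.lower().split()
    total, modifier, negate = 0.0, 1.0, False
    for tok in tokens:
        if tok in INTENSIFIERS:
            modifier *= INTENSIFIERS[tok]
        elif tok in NEGATORS:
            negate = not negate
        elif tok in LEXICON:
            value = LEXICON[tok] * modifier
            total += -value if negate else value
            modifier, negate = 1.0, False  # modifiers apply to one word only
    return total
```

With this toy scheme, `score("very good")` gives 4.5 and `score("not good")` gives -3.0, which illustrates why such methods need no training and why their decisions are easy to explain.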
translated by Google Translate
As modern data pipelines continue to collect, produce, and store a variety of data formats, extracting and combining value from traditional and context-rich sources such as strings, text, video, audio, and logs becomes a manual process, because such formats are unsuitable for an RDBMS. To tap into this dark data, domain experts analyze it, extract insights, and integrate them into the data repositories. This process can involve out-of-DBMS, ad-hoc analysis and processing, resulting in ETL overhead, engineering effort, and suboptimal performance. While AI systems based on ML models can automate the analysis process, they often generate answers that are themselves context-rich. Using multiple sources of truth, either for training the models or in the form of knowledge bases, further exacerbates the problem of consolidating the data of interest. We envision an analytical engine co-optimized with components that enable context-rich analysis. Firstly, as the data coming from different sources or resulting from model answering cannot be cleaned ahead of time, we propose using online data integration via model-assisted similarity operations. Secondly, we aim for holistic pipeline cost- and rule-based optimization across relational and model-based operators. Thirdly, with increasingly heterogeneous hardware and equally heterogeneous workloads ranging from traditional relational analytics to generative model inference, we envision a system that adapts just-in-time to complex analytical query requirements. To solve increasingly complex analytical problems, ML offers attractive solutions that must be combined with traditional analytical processing and benefit from decades of database community research to achieve scalability and performance that are effortless for the end user.
Text-based personality computing (TPC) has attracted growing research interest in NLP. In this paper, we describe 15 challenges that we believe deserve the attention of the research community. These challenges are organized under the following topics: personality taxonomies, measurement quality, datasets, performance evaluation, modelling choices, and ethics and fairness. In addressing each challenge, we not only combine perspectives from both NLP and the social sciences, but also offer concrete suggestions towards more valid and reliable TPC research.
Stance detection (SD) can be considered a special case of textual entailment recognition (TER), a generic natural language task. Modelling SD as TER may offer benefits like more training data and a more general learning scheme. In this paper, we present an initial empirical analysis of this approach. We apply it to a difficult but relevant test case where no existing labelled SD dataset is available, because this is where modelling SD as TER may be especially helpful. We also leverage measurement knowledge from social sciences to improve model performance. We discuss our findings and suggest future research directions.
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
Runtime verification (RV) has the potential to enable the safe operation of safety-critical systems that are too complex to formally verify, such as Robot Operating System 2 (ROS2) applications. Writing correct monitors can itself be complex, however, and errors in the monitoring subsystem threaten the mission as a whole. This paper gives an overview of a formal approach for generating runtime monitors for autonomous robots from requirements written in structured natural language. Our approach integrates the Formal Requirement Elicitation Tool (FRET) with Copilot, a runtime verification framework, through the OGMA integration tool. FRET is used to specify requirements with unambiguous semantics, which are then automatically translated into temporal logic formulae. OGMA generates monitor specifications from the FRET output, which are compiled into hard real-time C99. To facilitate the integration of the monitors in ROS2, we have extended OGMA to generate ROS2 packages defining monitoring nodes that run the monitors whenever new data is available and publish the results of any violations. The goal of our approach is to treat the generated ROS2 packages as black boxes and integrate them into larger ROS2 systems with minimal effort.
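As a rough illustration of what such a monitor does at runtime, the following is a hedged pure-Python sketch of a stream monitor for an "always" (invariant) property. It is not the C99 code that OGMA/Copilot generate, and the requirement shown is hypothetical; a generated ROS2 monitoring node would run an equivalent check each time new data arrives and publish any violation:

```python
# Illustrative sketch (NOT Ogma/Copilot output): a stream-based runtime
# monitor for the invariant "always (speed <= limit)". Each new sample is
# checked as it arrives; violations are recorded with their step index.

class AlwaysMonitor:
    """Monitors an invariant predicate over a stream of samples."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.violations = []  # list of (step, sample) pairs

    def step(self, step_no, sample):
        """Feed one new sample; return True if the invariant held."""
        ok = self.predicate(sample)
        if not ok:
            self.violations.append((step_no, sample))
        return ok

# Hypothetical requirement: speed shall always stay at or below 2.0 m/s.
monitor = AlwaysMonitor(lambda speed: speed <= 2.0)
results = [monitor.step(i, v) for i, v in enumerate([1.0, 1.8, 2.4, 1.2])]
```

Treating the monitor as a black box in this way mirrors the paper's goal: the surrounding system only sees pass/fail results published for each new observation.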
Anatomical tracing data provide detailed information about brain circuitry that is essential for addressing some common errors in diffusion MRI tractography. However, automated detection of fiber bundles on tracing data is challenging due to the presence of truncation, noise, and artifacts, as well as intensity and contrast variations. In this work, we propose a deep learning method with an anatomically constrained loss function that incorporates anatomy-based constraints to accurately segment fiber bundles on tracer sections of the macaque brain. Furthermore, given the limited availability of manual labels, we use semi-supervised training techniques to make effective use of unlabeled data to improve performance, together with location constraints to further reduce false positives. Evaluation of the method on unseen sections from a different macaque yielded encouraging results, with a true positive rate of about 0.90. The code for our method is available at https://github.com/v-sundaresan/fiberbundle_seg_tracing.
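To illustrate the general shape of a semi-supervised segmentation objective of this kind, here is a minimal pure-Python sketch that combines a supervised Dice-style term on labelled data with an unsupervised consistency term on unlabelled data. The weighting and the specific terms are illustrative assumptions, not the paper's exact loss:

```python
# Hedged sketch of a combined semi-supervised segmentation loss:
# a supervised soft-Dice term on labelled pixels, plus a consistency term
# penalizing disagreement between predictions for two augmented views of
# unlabelled data. Weights and terms are illustrative, not the paper's.

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss over flat lists of probabilities / binary labels."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def consistency_loss(pred_a, pred_b):
    """Mean squared difference between predictions for two views."""
    return sum((a - b) ** 2 for a, b in zip(pred_a, pred_b)) / len(pred_a)

def semi_supervised_loss(pred, target, pred_a, pred_b, weight=0.5):
    """Supervised Dice term plus weighted unsupervised consistency term."""
    return dice_loss(pred, target) + weight * consistency_loss(pred_a, pred_b)
```

The unlabelled data contribute through the consistency term alone, which is what lets training exploit sections without manual labels.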
We present a deep learning approach for approximating solutions of the Hamilton-Jacobi-Bellman partial differential equation (HJB PDE) associated with the nonlinear quadratic regulator (NLQR) problem. A state-dependent Riccati equation control law is first used to generate a gradient-augmented synthetic dataset for supervised learning. The resulting supervised model then serves as a warm start for the minimization of a loss function based on the residual of the HJB PDE. The combination of supervised learning and residual minimization avoids spurious solutions and mitigates the data inefficiency of a purely supervised approach. Numerical tests validate the distinct advantages of the proposed methodology.
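The residual term in such a loss can be made concrete on a one-dimensional linear-quadratic special case, where the value function is known in closed form. The sketch below is illustrative only (a scalar problem and a quadratic value-function ansatz, not the paper's network): it evaluates the HJB residual for dynamics x' = a x + b u with running cost q x^2 + r u^2, which a learned value function would be trained to drive to zero.

```python
# Sketch of the HJB residual for a 1-D linear-quadratic problem
# (dynamics x' = a*x + b*u, running cost q*x^2 + r*u^2), with the
# candidate value function V(x) = p*x^2. A learned V would be trained to
# make this residual vanish; here we can check it against the exact
# scalar Riccati solution. Illustrative, not the paper's method.

import math

def hjb_residual(p, x, a, b, q, r):
    """Residual of 0 = min_u [q x^2 + r u^2 + V'(x)(a x + b u)] for V = p x^2."""
    dV = 2.0 * p * x             # V'(x)
    u = -b * dV / (2.0 * r)      # minimizing control
    return q * x ** 2 + r * u ** 2 + dV * (a * x + b * u)

def riccati_p(a, b, q, r):
    """Positive root of q + 2 a p - b^2 p^2 / r = 0 (scalar Riccati equation)."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
```

For the exact Riccati coefficient the residual is identically zero; for an approximate (learned) V it measures how far the candidate is from satisfying the HJB PDE, which is what the residual-minimization stage exploits.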
As autonomous systems become part of our daily lives, ensuring their trustworthiness is crucial. There are a number of techniques for demonstrating trustworthiness; common to all of them is the need to articulate specifications. In this paper, we take a broad view of specification, focusing on top-level requirements including, but not limited to, functionality, safety, security, and other non-functional properties. The main contribution of this article is a set of high-level intellectual challenges for the autonomous systems community related to specifying for trustworthiness. We also describe unique specification challenges arising in a number of application domains of autonomous systems.
In the problem of quantum channel discrimination, one distinguishes between a given number of quantum channels by sending an input state through a channel and measuring the output state. This work studies applications of variational quantum circuits and machine learning techniques for discriminating such channels. In particular, we explore (i) the practical implementation of embedding this task into the framework of variational quantum computing, (ii) training a quantum classifier based on variational quantum circuits, and (iii) applying the quantum kernel estimation technique. To test these three channel discrimination approaches, we consider a pair of entanglement-breaking channels and depolarizing channels with two different depolarization factors. For approach (i), we address the quantum channel discrimination problem using the widely discussed parallel and sequential strategies, and we demonstrate the advantage of the latter in terms of better convergence with fewer quantum resources. Quantum channel discrimination with a variational quantum classifier (ii) works even with random and mixed input states and simple variational circuits. The kernel-based classification approach (iii) is also found to be effective, as it allows one to discriminate depolarizing channels associated not only with fixed values of the depolarization factor, but with a range of values. Additionally, we found that a simple modification of one of the commonly used kernels significantly improves the efficiency of this approach. Finally, our numerical findings reveal that the performance of variational methods for channel discrimination depends on the trace of the product of the output states. These findings demonstrate that quantum machine learning can be used to discriminate channels, such as those representing physical noise processes.
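To illustrate the kernel idea in approach (iii), here is a hedged single-qubit toy computed directly with statevectors rather than circuits (the feature map and all names are illustrative assumptions, not the paper's construction): a data value x is encoded by an RY rotation, and the state fidelity between two encoded states serves as the kernel entry.

```python
# Toy sketch of quantum kernel estimation (single qubit, statevector math,
# NOT the paper's circuits): encode a feature x with an RY rotation,
# |psi(x)> = (cos(x/2), sin(x/2)), and use the state fidelity
# K(x, y) = |<psi(x)|psi(y)>|^2 as a kernel between data points.

import math

def feature_state(x):
    """Statevector of RY(x)|0> as a pair of real amplitudes."""
    return (math.cos(x / 2.0), math.sin(x / 2.0))

def fidelity_kernel(x, y):
    """K(x, y) = |<psi(x)|psi(y)>|^2; equals cos((x - y)/2)^2 here."""
    ax, bx = feature_state(x)
    ay, by = feature_state(y)
    overlap = ax * ay + bx * by
    return overlap * overlap
```

A classical classifier (e.g. an SVM) trained on such a kernel matrix can then separate measurement statistics produced by different channels, which is the role kernel estimation plays in the paper.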